Part A consists of analysis of FAA and BTS data about airport operations, including number of commuters, enplanements, specific flight information over time, and much more. While it is very hard to define ‘winning’ and ‘losing’ in aviation, this section aims to approach such examination through three lenses – 1) Most recent operation data (Best present performance as winning, please refer to section A1-1), 2) Increasing commuters and enplanements over years (Section A1-2), and 3) Less delays or cancellation as winning (Section A2). This section also calls highlight to two topic of my personal interest – COVID’s impact on airport (Section A1-3) and Air Carrier’s delay stats (Section A2-2).
It is important to acknowledge that not only aviation hubs of different sizes are hard to compare but also each and every airport is unique, in its design, operation, and population it intends to serve. Hence, some data analysis are longitudinal to avoid such problem of incommensurability. Additionally, despite some analysis covering smaller airports, the majority of this part focuses on large aviation hubs that has more flight data to work with.
Data Sources:
Federal Aviation Administration http://aspm.faa.gov/main/taf.asp
BTS data accessed through ‘skynet’ package
The first approach to examine winning and losing airport is through their present performance. The most recent accessible data is from 2023, hence, the following plots show commuters and enplanements of large and small aviation hubs in 2023. Top 10 in each category is highlight with color.
Interestingly, the Top 10 for commuters and enplanements are quite different for large airports. For instance, Atlanta International Airport (ATL) has the most enplanements, yet not as much commuters comparing to other hubs. At the same time, Ronald Regan Washington International Airport (DCA) is seeing the opposite trend. Through comparing the two charts, some potential candidates for winning airports are O’Hare Int’l (ORD), Dallas Fort Worth (DFW), Denver Int’l (DEN), and Charlotte Douglas Int’l (CLT). For cities like New York City that is served by three airports – commuters and flights seem to be spread out across the three large hubs LGA, JFK, and EWR, making them under less pressure.
# Find top 10 in Commuters
Top_10_Comm_L <- large_hub_commuter[order(-large_hub_commuter$`2023`), ]
Top_10_Comm_L <- head(Top_10_Comm_L, 10)
large_hub_commuter <- large_hub_commuter %>%
mutate(top_10 = ifelse(LOC_ID %in% Top_10_Comm_L$LOC_ID, "Yes", "No"))
fig_commute_L <- plot_ly(large_hub_commuter, x=~LOC_ID, y=~`2023`, type='bar',
color=~top_10, colors=c("#d2d2cb","#594a40") ) %>%
layout(title = "Total Commuters in 2023 (Large Airports)",
xaxis = list(title = "Airports"),
yaxis = list(title = "Number of Commuters"),
showlegend=FALSE, margin=0.01)
fig_commute_L
# Find top 10 in Enplanements
Top_10_ENPL_L <- large_hub_ENPL[order(-large_hub_ENPL$`2023`), ]
Top_10_ENPL_L <- head(Top_10_ENPL_L, 10)
large_hub_ENPL <- large_hub_ENPL %>%
mutate(top_10 = ifelse(LOC_ID %in% Top_10_ENPL_L$LOC_ID, "Yes", "No"))
fig_ENPL_L <- plot_ly(large_hub_ENPL, x=~LOC_ID, y=~`2023`, type='bar',
color=~top_10, colors=c("#d2d2cb","#81947a") ) %>%
layout(title = "Total Enplanements in 2023 (Large Airports)",
xaxis = list(title = "Airports"),
yaxis = list(title = "Number of Commuters"),
showlegend=FALSE, margin=0.5)
fig_ENPL_L
Small airports show a more consistent result between commuters and enplanements. Some top performers are Louisville Int’l (SDF), Spokane Int’l (GEG), Buffalo Niagara Int’l (BUF), Norfolk Int’l (ORF), and Will Rogers Airport in Oklahaoma City (OKC). Several of them are serving as important regional hubs that fill in gaps among larger hubs.
# Find top 10 in Commuters
Top_10_Comm_S <- small_hub_commuter[order(-small_hub_commuter$`2023`), ]
Top_10_Comm_S <- head(Top_10_Comm_S, 10)
small_hub_commuter <- small_hub_commuter %>%
mutate(top_10 = ifelse(LOC_ID %in% Top_10_Comm_S$LOC_ID, "Yes", "No"))
fig_commute_S <- plot_ly(small_hub_commuter, x=~LOC_ID, y=~`2023`, type='bar',
color=~top_10, colors=c("#d2d2cb","#594a40") ) %>%
layout(title = "Total Commuters in 2023 (Small Airports)",
xaxis = list(title = "Airports"),
yaxis = list(title = "Number of Commuters"),
showlegend=FALSE, margin=0.5)
fig_commute_S
# Find top 10 in Commuters
Top_10_ENPL_S <- small_hub_ENPL[order(-small_hub_ENPL$`2023`), ]
Top_10_ENPL_S <- head(Top_10_ENPL_S, 10)
small_hub_ENPL <- small_hub_ENPL %>%
mutate(top_10 = ifelse(LOC_ID %in% Top_10_ENPL_S$LOC_ID, "Yes", "No"))
fig_ENPL_S <- plot_ly(small_hub_ENPL, x=~LOC_ID, y=~`2023`, type='bar',
color=~top_10, colors=c("#d2d2cb","#81947a") ) %>%
layout(title = "Total Enplanements in 2023 (Small Airports)",
xaxis = list(title = "Airports"),
yaxis = list(title = "Number of Commuters"),
showlegend=FALSE, margin=0.5)
fig_ENPL_S
The second approach is to track the change in commuters and enplanements over the past years to see whether the airports have been attracting travelers and airlines consistently over the years.The analysis here compares data from 2013 and 2023 to make a longitudinal case over the past 10 years. The graphs indicates percent change each airport has encountered in this period. Potentially due to the impact of COVID, the majority of both large and small hubs are experiencing significant downfall in commuters. Very few ones - including JFK, BOS, MCO (Orkando), SAN (San Diego), and TPA (Tampa) - are seeing an increase, and can be potentially categorized as winners. On the other hand, airports like Honolulu (HNL), Baltimore (BWI), ATL (Atlanta), Dallas (DTW), and Philadelphia (PHL) are potential losers if applying the same criteria. Interestingly, none of them was ranked top 10 when it comes to absolute number of commuters (as shown in the previous section), but they seem to be doing a better job of keeping existing and expanding their operations over the past 10 years.
In stark contrast to commuters data, almost all large and small airports are witnessing an increase in Enplanements over the past 10 years. Austin (AUS) and Nashville (BNA), for instance, has doubled enplanements in 2023 in comparison to 2013. Among small airports, Sarasota Bradenton International in Florida (SRQ) has more than 250% increase in enplanements.
Combining commuters and enplanements data, we may infer that potentially with the impact of COVID, international travels have been in low spirits, but with the recovery in the past 2 years, there are potentially more domestic flights that carries less commuters per flight. Air carriers may also have adjusted their operating strategies to focus on smaller aircrafts for their fleet. Given the huge impact of COVID on aviation, it is worth looking at how airports manage and recover from the COVID times, which will be presented in the next section.
# Large Hub
large_hub_commuter <- large_hub_commuter %>% mutate(change_comm = (`2023` - `2013`)/`2013`*100)
large_hub_ENPL <- large_hub_ENPL %>% mutate(change_enpl = (`2023` - `2013`)/`2013`*100)
fig_commute_L_change <- plot_ly(large_hub_commuter, x=~LOC_ID, y=~change_comm, type='bar',
color=~top_10, colors=c("#d2d2cb","#594a40") ) %>%
layout(title = "Change in Commuters 2013 - 2023 (Large Airports)",
xaxis = list(title = "Airports"),
yaxis = list(title = "Percent Change in Commuters"),
showlegend=FALSE, margin=0.5)
fig_commute_L_change
fig_ENPL_L_change <- plot_ly(large_hub_ENPL, x=~LOC_ID, y=~change_enpl, type='bar',
color=~top_10, colors=c("#d2d2cb","#81947a") ) %>%
layout(title = "Change in Enplanements 2013 - 2023 (Large Airports)",
xaxis = list(title = "Airports"),
yaxis = list(title = "Percent Change in Enplanements"),
showlegend=FALSE, margin=0.5)
fig_ENPL_L_change
# Small Hub
small_hub_commuter <- small_hub_commuter %>% mutate(change_comm = (`2023` - `2013`)/`2013`*100)
small_hub_ENPL <- small_hub_ENPL %>% mutate(change_enpl = (`2023` - `2013`)/`2013`*100)
fig_commute_S_change <- plot_ly(small_hub_commuter, x=~LOC_ID, y=~change_comm, type='bar',
color=~top_10, colors=c("#d2d2cb","#594a40") ) %>%
layout(title = "Change in Commuters 2013 - 2023 (Small Airports)",
xaxis = list(title = "Airports"),
yaxis = list(title = "Percent Change in Commuters"),
showlegend=FALSE, margin=0.5)
fig_commute_S_change
fig_ENPL_S_change <- plot_ly(small_hub_ENPL, x=~LOC_ID, y=~change_enpl, type='bar',
color=~top_10, colors=c("#d2d2cb","#81947a") ) %>%
layout(title = "Change in Enplanements 2013 - 2023 (Small Airports)",
xaxis = list(title = "Airports"),
yaxis = list(title = "Percent Change in Enplanements"),
showlegend=FALSE, margin=0.5)
fig_ENPL_S_change
This section highlights the performance of 5 winners and 5 losers identified in the previous section over the COVID time. The figure below shows the annual percent change at each airport from 2019 to 2023. All 10 airports encountered a huge drop in commuters at the beginning of COVID in 2020. Since then, we are seeing that most winning airports have successfully recovered a large number of commuters either in 2021 (such as Tampa and Orlando) or 2022 (such as Boston and JFK), while the losing airports are still suffering from aftermath of COVID and fails to come back. Honolulu, for example, is a world-famous vacation destination, with heavy reliance on international visitors coming through air, and therefore was impacted the most. With travel bans, health concerns, and sluggish economy post-COVID, it is experiencing a hard time to recover to the pre-COVID level. How non-contiguous U.S. Airports like Honolulu are managing financially over the COVID years is a topic of interest, and will be covered in Part B.
COVID_commuter <- large_hub_commuter %>% select(LOC_ID,`2019`, `2020`, `2021`, `2022`, `2023`)
COVID_commuter <- COVID_commuter %>% filter(LOC_ID %in% c('JFK ', 'BOS ', 'MCO ', 'SAN ', 'TPA ', 'HNL ', 'BWI ', 'ATL ', 'DTW ', 'PHL '))
COVID_commuter<- COVID_commuter %>% mutate(`19_20` = (`2020` - `2019`) / `2019`,
`20_21` = (`2021` - `2020`) / `2020`,
`21_22` = (`2022` - `2021`) / `2021`,
`22_23` = (`2023` - `2022`) / `2022`,)
COVID_commuter_Change <- COVID_commuter %>% select(LOC_ID, `19_20`, `20_21`, `21_22`, `22_23`) %>%
pivot_longer(cols = !LOC_ID, names_to = "Year", values_to = "Percent Change") %>%
pivot_wider(names_from = "LOC_ID", values_from= "Percent Change")
fig_COVID <- plot_ly(COVID_commuter_Change,x = ~Year, y = ~`ATL `, name = 'Atlanta (L)', type="scatter", mode = 'lines+markers',
line = list(color = '#DE9E6B', width=1), marker = list(color='#DE9E6B')) %>%
layout(xaxis = list(title='Year Range'), yaxis = list(title='Change in Commuters (%)'), title = list(text='COVID Recovery - Percent Change in Commuters 2019 - 2023'), margin=0.1) %>%
add_trace(y = ~`DTW `, name = 'Detroit (L)', line = list(color = '#F0B23E'), marker = list(color = '#F0B23E')) %>%
add_trace(y = ~`BWI `, name = 'Baltimore (L)', line = list(color = '#e3c2a0'), marker = list(color = '#e3c2a0')) %>%
add_trace(y = ~`PHL `, name = 'Philadelphia (L)', line = list(color = '#E0D7C8'), marker = list(color = '#E0D7C8')) %>%
add_trace(y = ~`HNL `, name = 'Honolulu (L)', line = list(color = '#EDDDBC'), marker = list(color = '#EDDDBC')) %>%
add_trace(y = ~`SAN `, name = 'San Diego (W)', line = list(color = '#B3BE76', dash = 'dash'), marker = list(color = '#B3BE76')) %>%
add_trace(y = ~`TPA `, name = 'Tampa (W)', line = list(color = '#9EA959', dash = 'dash'), marker = list(color = '#9EA959')) %>%
add_trace(y = ~`BOS `, name = 'Boston (W)', line = list(color = '#76822C', dash = 'dash'), marker = list(color = '#76822C')) %>%
add_trace(y = ~`MCO `, name = 'Orlando (W)', line = list(color = '#4F651A', dash = 'dash'), marker = list(color = '#4F651A')) %>%
add_trace(y = ~`JFK `, name = 'JFK (W)', line = list(color = '#2C3A16', dash = 'dash'), marker = list(color = '#2C3A16'))
fig_COVID
From the airport operations perspective, while number of commuters and enplanements (and the revenue they brings) are important, efficiency and traveler satisfaction are also critical factors that contributes an airport’s ‘soft power’ and reputation, which will implicitly impact their performance in the future (i.e., whether travelers and/or air carriers choose a specific airport over the competitors.) Delay is a critical indicator that impacts such ‘soft power’.
This section continues to focus on the same set of 5 winning (JFK, Orlando, Boston, Tampa, San Diego) and 5 losing airports (Atlanta, Dallas, Baltimore, Philadelphia, Honolulu) and explores delays, cancellation, and diversion of flights at those airports. For comparison purpose, this analysis focuses on two important travel seasons – Christmas and New Years (December) and summer (July). BTS data of 2022 December (as 2023 December data is not yet available) and 2023 July are accessed through skynet package in R.
LOC_ID <- c('JFK', 'BOS', 'MCO', 'SAN', 'TPA', 'HNL', 'BWI', 'ATL', 'DTW', 'PHL')
total_dep_flights <- delay_23_7 %>% group_by(origin) %>% count()
total_arr_flights <- delay_23_7 %>% group_by(dest) %>% count()
#departure delay
dep_delay_23_7 <- delay_23_7[!is.na(delay_23_7$dep_delay),]
dep_delay_23_7 <- dep_delay_23_7 %>% group_by(origin) %>% summarize(Depature_Delay_23_Jul = mean(dep_delay))
dep_delay_23_7 <- dep_delay_23_7 %>% filter(origin %in% c('JFK', 'BOS', 'MCO', 'SAN', 'TPA', 'HNL', 'BWI', 'ATL', 'DTW', 'PHL'))
#arrival delay
arr_delay_23_7 <- delay_23_7[!is.na(delay_23_7$arr_delay),]
arr_delay_23_7 <- arr_delay_23_7%>% group_by(dest) %>% summarize(Arrival_Delay_23_Jul = mean(arr_delay))
arr_delay_23_7 <- arr_delay_23_7 %>% filter(dest %in% c('JFK', 'BOS', 'MCO', 'SAN', 'TPA', 'HNL', 'BWI', 'ATL', 'DTW', 'PHL')) %>%
mutate(origin = dest)
#no. of cancel (ORG)
cancel_23_7 <- delay_23_7[!is.na(delay_23_7$cancelled),] %>% select(origin, cancelled)
cancel_23_7 <- cancel_23_7 %>% group_by(origin) %>% summarize(cancel_23_Jul = sum(cancelled==1))
cancel_23_7 <- cancel_23_7 %>% filter(origin %in% c('JFK', 'BOS', 'MCO', 'SAN', 'TPA', 'HNL', 'BWI', 'ATL', 'DTW', 'PHL'))
#no. of divert (ORG)
divert_23_7 <- delay_23_7[!is.na(delay_23_7$diverted),]
divert_23_7 <- divert_23_7 %>% group_by(origin) %>% summarize(divert_23_Jul = sum(diverted==1))
divert_23_7 <- divert_23_7 %>% filter(origin %in% c('JFK', 'BOS', 'MCO', 'SAN', 'TPA', 'HNL', 'BWI', 'ATL', 'DTW', 'PHL'))
#departure delay
dep_delay_22_12 <- delay_22_12[!is.na(delay_22_12$dep_delay),]
dep_delay_22_12 <- dep_delay_22_12 %>% group_by(origin) %>% summarize(Depature_Delay_22_Dec = mean(dep_delay))
dep_delay_22_12 <- dep_delay_22_12 %>% filter(origin %in% c('JFK', 'BOS', 'MCO', 'SAN', 'TPA', 'HNL', 'BWI', 'ATL', 'DTW', 'PHL'))
#arrival delay
arr_delay_22_12 <- delay_22_12[!is.na(delay_22_12$arr_delay),]
arr_delay_22_12 <- arr_delay_22_12%>% group_by(dest) %>% summarize(Arrival_Delay_22_Dec = mean(arr_delay))
arr_delay_22_12 <- arr_delay_22_12 %>% filter(dest %in% c('JFK', 'BOS', 'MCO', 'SAN', 'TPA', 'HNL', 'BWI', 'ATL', 'DTW', 'PHL')) %>%
mutate(origin = dest)
#no. of cancel (ORG)
cancel_22_12 <- delay_22_12[!is.na(delay_22_12$cancelled),] %>% select(origin, cancelled)
cancel_22_12 <- cancel_22_12 %>% group_by(origin) %>% summarize(cancel_22_Dec = sum(cancelled==1))
cancel_22_12 <- cancel_22_12 %>% filter(origin %in% c('JFK', 'BOS', 'MCO', 'SAN', 'TPA', 'HNL', 'BWI', 'ATL', 'DTW', 'PHL'))
#no. of divert (ORG)
divert_22_12 <- delay_22_12[!is.na(delay_22_12$diverted),]
divert_22_12 <- divert_22_12 %>% group_by(origin) %>% summarize(divert_22_Dec = sum(diverted==1))
divert_22_12 <- divert_22_12 %>% filter(origin %in% c('JFK', 'BOS', 'MCO', 'SAN', 'TPA', 'HNL', 'BWI', 'ATL', 'DTW', 'PHL'))
delay_comp <- left_join(dep_delay_23_7, arr_delay_23_7, by='origin') %>%
left_join(dep_delay_22_12, by='origin') %>%
left_join(arr_delay_22_12, by='origin') %>% select(-dest.x, -dest.y)
cancel_divert_comp <- left_join(cancel_23_7, divert_23_7, by='origin') %>%
left_join(cancel_22_12, by='origin') %>%
left_join(divert_22_12, by='origin')
delay_comp <- delay_comp %>% pivot_longer(cols = !'origin', names_to = "Variables", values_to = "Time")
cancel_divert_comp <- cancel_divert_comp %>% pivot_longer(cols = !'origin', names_to = "Variables", values_to = "Number")
delay_heat <- ggplot(delay_comp, aes(x = origin, y = Variables, fill = Time)) +
geom_tile(color = "white") + # Add tiles for heatmap
scale_fill_gradient(low = "#d2d2cb", high = "#8e7967") + # Gradient fill colors
labs(title = "Delay Heatmap (July 2023 vs. December 2022)") +
theme_minimal()
delay_heat
The first heat map reflects the delays at each airport. Departure delay and arrival delays are mapped separately - and we may assume departure delay has more to do with the airports’ operations while arrival delays are less under control. The heat map reveals most airports plotted are suffering from more delays in July as oppose to December, potentially due to busy travel season. An outlier would be Honolulu, which seems to be seeing more delays in winter, as it is a hot travel spot. Among all, hubs such as Orlando Int’l (MCO), JFK, and BOS have significantly longer average delay time than many other airports. Transit hubs such as Atlanta (ATL) and Dallas Fort Worth (DFW) are doing a relatively good job keeping down the delay time.
cancel_divert_heat <- ggplot(cancel_divert_comp, aes(x = origin, y = Variables, fill = Number)) +
geom_tile(color = "white") + # Add tiles for heatmap
scale_fill_gradient(low = "#d2d2cb", high = "#81947a") + # Gradient fill colors
labs(title = "Cancellation & Divert Heatmap (July 2023 vs. December 2022)") +
theme_minimal()
cancel_divert_heat
The second heat map shows the number of cancellations and diversions. Overall, cancellation is more common than diversion, and for most airports plotted here, winter tends to be the time with more cancellation, potentially due to weather conditions. JFK and Boston, however, sees a lot more cancellations in July. It is intriguing to find each airport, despite their similarity in commuters and enplanement stats, reacts and performs quite differently to seasonality, delays, and cancellations.
BTS data on flight stats reveals whether a delay is due to the carriers, weather, NAS (National Airspace System), or security issues. Out of the four, delay due to carrier is the most common and usually takes up the longest time. Hence, this section hopes to highlight whether different operating carriers have different average delay time during peak travel seasons - summer and Christmas. Here, 2023 July and 2022 December data are chosen to show this comparison.
While air carriers can vary a lot in their size of business, no. of flights, capacity, and operational strategies, the two graphs below show some interesting trends across and among carriers. For larger carriers, United seems to be performing consistent while Jetblue, Delta, and AA all shows a longer average delay during summer travel season. Smaller regional carriers such as Mesa Airlines (as a part of United) and Horizon Air (as a part of Alaska Air) are operating seasonally in winter. Such result, however, only provides an overview – reasons behind carrier delay can vary a lot. The only thing for sure is, delay is a critical factor when it comes to customer satisfaction.
carrier_delay_23_7 <- on_time_23_7 %>% select(op_carrier, carrier_delay) %>% drop_na()
carrier_delay_23_7 <- carrier_delay_23_7 %>% group_by(op_carrier) %>% summarize(mean_delay = mean(carrier_delay), median_delay = median(carrier_delay))
carrier_delay_23_7 <- carrier_delay_23_7 %>%
mutate(carrier_full_name = case_when(
op_carrier == 'WN' ~ "WN - Southwest Airlines",
op_carrier == 'YX' ~ "YX - Republic Airways",
op_carrier == 'UA' ~ "UA - United Airlines",
op_carrier == 'OO' ~ "OO - SkyWest Airlines",
op_carrier == 'OH' ~ "OH - PSA Airlines",
op_carrier == 'NK' ~ "NK - Spirit Airlines",
op_carrier == 'MQ' ~ "MQ - Envoy Air",
op_carrier == 'HA' ~ "HA - Hawaiian Airlines",
op_carrier == 'G4' ~ "G4 - Allegiant Air",
op_carrier == 'F9' ~ "F9 - Frontier Airlines",
op_carrier == 'DL' ~ "DL - Delta Air Lines",
op_carrier == 'B6' ~ "B6 - Jetblue Airways",
op_carrier == 'AS' ~ "AS - Alaska Airlines",
op_carrier == 'AA' ~ "AA - American Airlines",
op_carrier == '9E' ~ "9E - Endeavor Air",
))
carrier_delay_22_12 <- on_time_22_12 %>% select(op_carrier, carrier_delay) %>% drop_na()
carrier_delay_22_12 <- carrier_delay_22_12 %>% group_by(op_carrier) %>% summarize(mean_delay = mean(carrier_delay), median_delay = median(carrier_delay))
carrier_delay_22_12 <- carrier_delay_22_12 %>%
mutate(carrier_full_name = case_when(
op_carrier == 'WN' ~ "WN - Southwest Airlines",
op_carrier == 'YX' ~ "YX - Republic Airways",
op_carrier == 'YV' ~ "YV - Mesa Airlines",
op_carrier == 'UA' ~ "UA - United Airlines",
op_carrier == 'OO' ~ "OO - SkyWest Airlines",
op_carrier == 'OH' ~ "OH - PSA Airlines",
op_carrier == 'NK' ~ "NK - Spirit Airlines",
op_carrier == 'QX' ~ "QX - Horizon Air",
op_carrier == 'MQ' ~ "MQ - Envoy Air",
op_carrier == 'HA' ~ "HA - Hawaiian Airlines",
op_carrier == 'G4' ~ "G4 - Allegiant Air",
op_carrier == 'F9' ~ "F9 - Frontier Airlines",
op_carrier == 'DL' ~ "DL - Delta Air Lines",
op_carrier == 'B6' ~ "B6 - Jetblue Airways",
op_carrier == 'AS' ~ "AS - Alaska Airlines",
op_carrier == 'AA' ~ "AA - American Airlines",
op_carrier == '9E' ~ "9E - Endeavor Air",
))
carrier_delay <- ggplot(carrier_delay_23_7) +
geom_col(aes(mean_delay, op_carrier), fill=colors_carrier)
carrier_delay <- carrier_delay +
scale_x_continuous(
limits = c(0, 45),
breaks = seq(0, 45, by = 5),
expand = c(0, 0), # The horizontal axis does not extend to either side
position = "top" # Labels are located on the top
) +
scale_y_discrete(expand = expansion(add = c(0, 0.5))) +
theme(
# Set background color to white
panel.background = element_rect(fill = "white"),
# Set the color and the width of the grid lines for the horizontal axis
panel.grid.major.x = element_line(color = "#A8BAC4", size = 0.3),
# Remove tick marks by setting their length to 0
axis.ticks.length = unit(0, "mm"),
# Remove the title for both axes
axis.title = element_blank(),
# Only left line of the vertical axis is painted in black
axis.line.y.left = element_line(color = "black"),
# Remove labels from the vertical axis
axis.text.y = element_blank(),
# But customize labels for the horizontal axis
axis.text.x = element_text(family = "Tahoma", size = 5)
)
carrier_delay <- carrier_delay +
geom_shadowtext(data=carrier_delay_23_7, aes(0, y=op_carrier, label = paste(carrier_full_name, round(mean_delay, 2), sep = " : ")),
hjust = 0,
nudge_x = 0.3,
bg.colour = NA,
family = "Tahoma",
size=3.7,
colour=colors_carrier_font,
linewidth=0.1) +
labs(title = "Average Delay Time by Operating Air Carrier (mins) [July 2023]") +
theme(plot.title = element_text(family="Tahoma", size=15, margin = margin(10, 0, 10, 0)))
carrier_delay
carrier_delay_22 <- ggplot(carrier_delay_22_12) +
geom_col(aes(mean_delay, op_carrier), fill=colors_carrier_add)
carrier_delay_22 <- carrier_delay_22 +
scale_x_continuous(
limits = c(0, 45),
breaks = seq(0, 45, by = 5),
expand = c(0, 0), # The horizontal axis does not extend to either side
position = "top" # Labels are located on the top
) +
scale_y_discrete(expand = expansion(add = c(0, 0.5))) +
theme(
# Set background color to white
panel.background = element_rect(fill = "white"),
# Set the color and the width of the grid lines for the horizontal axis
panel.grid.major.x = element_line(color = "#A8BAC4", size = 0.3),
# Remove tick marks by setting their length to 0
axis.ticks.length = unit(0, "mm"),
# Remove the title for both axes
axis.title = element_blank(),
# Only left line of the vertical axis is painted in black
axis.line.y.left = element_line(color = "black"),
# Remove labels from the vertical axis
axis.text.y = element_blank(),
# But customize labels for the horizontal axis
axis.text.x = element_text(family = "Tahoma", size = 5)
)
carrier_delay_22 <- carrier_delay_22 +
geom_shadowtext(data=carrier_delay_22_12, aes(0, y=op_carrier, label = paste(carrier_full_name, round(mean_delay, 2), sep = " : ")),
hjust = 0,
nudge_x = 0.3,
bg.colour = NA,
family = "Tahoma",
size=4,
colour=colors_carrier_font_add,
linewidth=0.1) +
labs(title = "Average Delay Time by Operating Air Carrier (mins) [Dec 2022]") +
theme(plot.title = element_text(family="Tahoma", size=15, margin = margin(10, 0, 10, 0)))
carrier_delay_22
Prompt: What patterns in finance and budget emerge across time for airports with shared traits? Airports vary widely- some are major shipping nodes, others have exceptionally high tourist traffic, and so on. Note that the research question stated above could easily be explored for much longer than the two weeks available. As such, the assignment should be treated as an initial exploration of data that may point to, but not fully dive into, deeper lines of inquiry.
Several major airports on non-contiguous U.S. territories are selected for this section, which means they are relatively isolated from most of the U.S. States and they usually heavily rely on air transportation to build connections, transporting resources, and bring in visitors. In particular, I am curious to learn more about Ted Stevens Anchorage International Airport in Alaska, Daniel K. Inouye International Airport in Honolulu, Antonio B. Won Pat International Airport in Guam, and Luis Munoz Marin International Airport in Puerto Rico.
The four airports of choice are categorized as different size hubs by FAA – Honolulu Airport being a large, PR and AK being medium, and Guam being small. It is interesting to compare different aspects of those airports’ finance given their similar reliance on air transportation but at different capacities.
Given the most recent data of 2023 is not yet available, the time frame for this analysis is 2019 - 2022, in hope to unveil trends over the COVID years.
Data Source: CATS (Certification Activity Tracking System) reports from FAA https://cats.airports.faa.gov/
Airport’s operational revenue usually consists of three components - aeronautical from passenger and airlines, aeronautical from non-passenger operations, and non-aeronautical revenues. From examining the op revenue breakdown of each airport, it becomes clear that they have different revenue-generating strategies.
A common understanding or inference may be that aeronautical revenue from passenger airlines contributing to the majority of operational revenue. However, the following chart shows it is not always the case for every airport.
Anchorage in Alaska, earns most from non-passenger aeronautical operations, pointing towards its significant cargo businesses. Guam, Honolulu, and Puerto Rico, on the other hand, has significant amount of revenue coming from passenger and airlines, and Honolulu specifically has a large share coming from non-aeronautical operations, such as retail, duty free, and other commercial practices at the airport. This aligns with their positions as popular tourist destinations.
The longitudinal comparison also clearly shows the different levels of COVID impact in different types of revenue. Non-passenger aeronautical revenue remains relatively intact, while non-aeronautical and partial passenger airline aeronautical revenue suffered a lot, potentially from social distancing and restrictions on public spaces that disrupts commercial activities at airport and discourages people from traveling for leisure purposes.
# Select Revenue Variables
Finance_Selected <- Finance_Data %>% select("LOC_ID", "YEAR", "Total.Passenger.Airline.Aeronautical.Revenue", "Total.Non.Passenger.Aeronautical.Revenue", "Total.Non.Aeronautical.Revenue") %>% melt(id = c("LOC_ID","YEAR"))
# Rename Variables
Finance_Selected <- Finance_Selected %>%
mutate(variable = recode(variable,
Total.Passenger.Airline.Aeronautical.Revenue = 'Aeronautical - Passenger Airline',
Total.Non.Passenger.Aeronautical.Revenue = 'Aeronautical - Non-passenger ',
Total.Non.Aeronautical.Revenue = 'Non-Aeronautical' ))
# Plot
colors = c("#2B4570", "#A8D0DB", "#D5B0AC")
fig_ANC <- plot_ly(Finance_Selected %>% filter(LOC_ID == "ANC"), x = ~YEAR, y = ~value,
color = ~variable, type = "bar", colors = colors, showlegend=FALSE) %>%
layout(xaxis = list(title = 'Anchorage'), yaxis = list(title='Total Revenue ($)'), barmode = 'stack')
fig_GUM <- plot_ly(Finance_Selected %>% filter(LOC_ID == "GUM"), x = ~YEAR, y = ~value,
color = ~variable, type = "bar", colors = colors, showlegend=FALSE) %>%
layout(xaxis = list(title = 'Guam'), barmode = 'stack')
fig_HNL <- plot_ly(Finance_Selected %>% filter(LOC_ID == "HNL"), x = ~YEAR, y = ~value,
color = ~variable, type = "bar", colors = colors, showlegend=FALSE) %>%
layout(xaxis = list(title = 'Honolulu'), barmode = 'stack')
fig_PR <- plot_ly(Finance_Selected %>% filter(LOC_ID == "SJU"), x = ~YEAR, y = ~value,
color = ~variable, type = "bar", colors = colors) %>%
layout(xaxis = list(title = 'Puerto Rico'), barmode = 'stack')
fig_combine <- subplot(fig_ANC, fig_GUM, fig_HNL, fig_PR, shareY = TRUE, titleX = TRUE) %>%
layout(title = list(text = "Breakdown of Total Revenue ($) at 4 Selected Airports"), margin=0.1)
fig_combine
Given the results from the first section, I hope to take a closer look at Anchorage’s non-passenger aeronautical revenue. As speculated, large portions come from cargo landing fees and fuel sales net profit or fuel flowage fees. While the emphasis on cargo makes Anchorage airport more resilient to COVID impacts than many other passenger-centered airports, the airport may face potential threats and risks from fuel price fluctuations.
# Select Revenue Variables
ANC_Non_Pass_Aero <- ANC %>% filter(YEAR== 2022) %>%
select("LOC_ID", "YEAR", "Landing.fees.from.cargo",
"Landing.fees.from.GA.and.military", "X.FBO.revenue...contract.or.sponsor.operated",
"Cargo.and.hangar.rentals", "Aviation.fuel.tax.retained.for.airport.use",
"Fuel.sales.net.profit.loss.or.fuel.flowage.fees", "Security.reimbursement.from.Federal.Government",
"Other.non.passenger.aeronautical.revenue") %>% melt(id = c("LOC_ID","YEAR"))
ANC_Non_Pass_Aero_rev <- filter_if(ANC_Non_Pass_Aero, is.numeric, all_vars((.) != 0))
ANC_Non_Pass_Aero_rev$variable <- factor(ANC_Non_Pass_Aero_rev$variable)
# Rename
ANC_Non_Pass_Aero_rev <- ANC_Non_Pass_Aero_rev %>%
mutate(variable = recode(variable,
Fuel.sales.net.profit.loss.or.fuel.flowage.fees = 'Fuel sales net profit/loss or fuel flowage fees',
Landing.fees.from.cargo = 'Landing Fees from Cargo',
Cargo.and.hangar.rentals = 'Cargo and Hangar Rentals'))
# Plot
fig0 <- ANC_Non_Pass_Aero_rev %>% plot_ly(labels = ~variable, values = ~value)
fig0 <- fig0 %>% add_pie(hole = 0.4, marker = list(colors = colors_3))
fig0 <- fig0 %>% layout(title = "Anchorage INTL - Non-Passenger Aeronautic Revenue Breakdown (2022)",
xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
legend = list(
title = "Variable"
),
categoryorder = "category ascending")
fig0
Similar to the previous section, I want to dig deeper in Guam and Puerto Rico’s non-aeronautical revenue breakdown in 2022, which contributes a significant share of their total revenue. Both charts indicate that the airports generate revenue from terminal services, including retail and duty free, food and beverages, and other services. In addition, both Guam and PR airport generates a large amount of revenue from badges, fingerprinting, operating cost recovery, and receiving local ARP funds. Luis Munoz Martin airport also generates income from rental car services and parking, while Guam gets land and non-terminal facility leases and revenues.
Though not shown in graphs, in comparison to pre-COVID revenue breakdown, the ARP funds for COVID makes a big difference when it comes to proportion of revenue breakdown and bumps up the overall revenue in 2022. This may also justify the charts in the following section on expenses – there isn’t a significant drop in expenses at those two airports. However, reliance on ARP funding is not long-term, and the financial performance over the next few years may show more aftereffects and demonstrates the real capability of airports’ capacity to recover.
# Select Relevant Variables
GUM_Non_Aero_rev <- GUM %>% filter(YEAR== 2022) %>%
select("LOC_ID", "YEAR", "Land.and.non.terminal.facility.leases.and.revenues",
"Terminal.food.and.beverage", "Terminal.retail.stores.and.duty.free",
"Terminal.services.and.other", "Rental.cars.excludes.customer.facility.charges",
"Parking.and.ground.transportation", "Hotel", "Other.Non.Aeronautical.Revenue") %>% melt(id = c("LOC_ID","YEAR"))
SJU_Non_Aero_rev <- SJU %>% filter(YEAR== 2022) %>%
select("LOC_ID", "YEAR", "Land.and.non.terminal.facility.leases.and.revenues",
"Terminal.food.and.beverage", "Terminal.retail.stores.and.duty.free",
"Terminal.services.and.other", "Rental.cars.excludes.customer.facility.charges",
"Parking.and.ground.transportation", "Hotel", "Other.Non.Aeronautical.Revenue") %>% melt(id = c("LOC_ID","YEAR"))
# Rename Variables
GUM_Non_Aero_rev <- GUM_Non_Aero_rev %>%
mutate(variable = recode(variable,
Land.and.non.terminal.facility.leases.and.revenues = 'Land and Non-terminal Facility Leases and Revenues',
Terminal.food.and.beverage = 'Terminal - Food & Beverage',
Terminal.retail.stores.and.duty.free = 'Terminal - Retail & Duty Free',
Terminal.services.and.other = 'Terminal - Services & Other',
Rental.cars.excludes.customer.facility.charges = 'Rental Cars (Excludes Customer Facility Charges',
Parking.and.ground.transportation = 'Parking and Ground Transportation',
Hotel = 'Hotel',
Other.Non.Aeronautical.Revenue = 'Other Non-aeronautical' ))
SJU_Non_Aero_rev <- SJU_Non_Aero_rev %>%
mutate(variable = recode(variable,
Land.and.non.terminal.facility.leases.and.revenues = 'Land and Non-terminal Facility Leases and Revenues',
Terminal.food.and.beverage = 'Terminal - Food & Beverage',
Terminal.retail.stores.and.duty.free = 'Terminal - Retail & Duty Free',
Terminal.services.and.other = 'Terminal - Services & Other',
Rental.cars.excludes.customer.facility.charges = 'Rental Cars (Excludes Customer Facility Charges',
Parking.and.ground.transportation = 'Parking and Ground Transportation',
Hotel = 'Hotel',
Other.Non.Aeronautical.Revenue = 'Other Non-aeronautical' ))
# Plot Guam
GUM_Non_Aero_rev$variable <- factor(GUM_Non_Aero_rev$variable)
fig1 <- GUM_Non_Aero_rev %>% plot_ly(labels = ~variable, values = ~value)
fig1 <- fig1 %>% add_pie(hole = 0.4, marker = list(colors = colors_8))
fig1 <- fig1 %>% layout(title = "Guam INTL",
xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
legend = list(
title = "Variable",
orientation = "v"
),
categoryorder = "category ascending")
fig1
SJU_Non_Aero_rev$variable <- factor(SJU_Non_Aero_rev$variable)
fig2 <- SJU_Non_Aero_rev %>% plot_ly(labels = ~variable, values = ~value)
fig2 <- fig2 %>% add_pie(hole = 0.4, marker = list(colors = colors_8))
fig2 <- fig2 %>% layout(title = "Luis Munoz Marin INTL (PR)",
xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
legend = list(
title = "Variable",
orientation = "v"
),
categoryorder = "category ascending")
fig2
Operational Expenses and Capital Expenses contribute to different aspects of airport operations and developments. The following graph shows the breakdown of those two types of expenses at selected airport since 2019. Interestingly, the operation expenses doesn’t seem to fluctuate a lot, yet considering the downturn of revenue in 2020 and 2021 as shown in previous sections, keeping the same level of operating expenses might have put on more burden on the airport finance.
At the same time, cap ex doesn’t seem to be impacted as much as one would imagine, except for Honolulu airport, whose cap ex halved in 2022 in comparison to the previous year. HNL is also the only airport among the selected airports that have higher capital expenses than operational expenses. On the flip side, Guam Airport’s cap expenses and operational expenses are very close. Anchorage, unlike others, even increased their cap expense in 2022. Generally speaking, capital expenses usually shows a lag in societal impacts comparing to operational expenses, given the financial budgeting schedule runs on an annual basis and has lower volatility.
The unqiue characteristics of each of the non-contiguous airports may be explained by their drastically different financing and operating strategies and different economic emphasis of cities they are located at – Anchorage on cargo, Hawaii and Puerto Rico as influential tourist destination, and Guam both as a tourist destinationa and military base.
# Merge Op-ex and Cap-ex
Op_Ex <- Finance_Data %>% select("LOC_ID", "YEAR", "Total.Operating.Expenses")
Op_Ex_wide <- pivot_wider(Op_Ex, names_from = "LOC_ID", values_from ="Total.Operating.Expenses")
Op_Ex_wide$YEAR = factor(Op_Ex_wide$YEAR)
names(Op_Ex_wide)[names(Op_Ex_wide) == "ANC"] <- "ANC_Op_Ex"
names(Op_Ex_wide)[names(Op_Ex_wide) == "GUM"] <- "GUM_Op_Ex"
names(Op_Ex_wide)[names(Op_Ex_wide) == "HNL"] <- "HNL_Op_Ex"
names(Op_Ex_wide)[names(Op_Ex_wide) == "SJU"] <- "SJU_Op_Ex"
Cap_Ex <- Finance_Data %>% select("LOC_ID", "YEAR", "Total.Capital.Expenditures")
Cap_Ex_wide <- pivot_wider(Cap_Ex, names_from = "LOC_ID", values_from ="Total.Capital.Expenditures")
Cap_Ex_wide$YEAR = factor(Cap_Ex_wide$YEAR)
names(Cap_Ex_wide)[names(Cap_Ex_wide) == "ANC"] <- "ANC_Cap_Ex"
names(Cap_Ex_wide)[names(Cap_Ex_wide) == "GUM"] <- "GUM_Cap_Ex"
names(Cap_Ex_wide)[names(Cap_Ex_wide) == "HNL"] <- "HNL_Cap_Ex"
names(Cap_Ex_wide)[names(Cap_Ex_wide) == "SJU"] <- "SJU_Cap_Ex"
Expense <- merge(Op_Ex_wide, Cap_Ex_wide, by="YEAR")
# Plot
fig_Ex <- plot_ly(Expense, x = ~YEAR, y = ~ANC_Op_Ex, name = 'ANC (Op-Ex)', type="scatter",
mode = 'lines+markers') %>%
layout(xaxis = list(title = 'Year'), yaxis = list(title='Expenses ($)'), title = list(text = "Operational and Capital Expenses 2019-2022"), margin=0.1)
fig_Ex <- fig_Ex %>% add_trace(y = ~ANC_Cap_Ex, name = 'ANC (Cap-Ex)',
line = list(color = '#A8D0DB', dash = 'dash'), marker = list(color = '#A8D0DB'))
fig_Ex <- fig_Ex %>% add_trace(y = ~GUM_Op_Ex, name = 'GUM (Op-Ex)',
line = list(color = '#832f0e'), marker = list(color = '#832f0e'))
fig_Ex <- fig_Ex %>% add_trace(y = ~GUM_Cap_Ex, name = 'GUM (Cap-Ex)',
line = list(color = '#e3c2a0', dash = 'dash'), marker = list(color = '#e3c2a0'))
fig_Ex <- fig_Ex %>% add_trace(y = ~HNL_Op_Ex, name = 'HNL INTL (Op-Ex)',
line = list(color = '#594a40'), marker = list(color = '#594a40'))
fig_Ex <- fig_Ex %>% add_trace(y = ~HNL_Cap_Ex, name = 'HNL INTL (Cap-Ex)',
line = list(color = '#8e7967', dash = 'dash'), marker = list(color = '#8e7967'))
fig_Ex <- fig_Ex %>% add_trace(y = ~SJU_Op_Ex, name = 'SJU (Op-Ex)',
line = list(color = '#81947a'), marker = list(color = '#81947a'))
fig_Ex <- fig_Ex %>% add_trace(y = ~SJU_Cap_Ex, name = 'SJU (Cap-Ex)',
line = list(color = '#d2d2cb', dash = 'dash'), marker = list(color = '#d2d2cb'))
fig_Ex
The last section aims to offer an overview of the selected airports’ beyond revenue and expenses – ‘attractiveness’ to all important stakeholders airlines, passengers, cargo carriers, etc. The following chart reveals 6 indicators that contribute to generating insights, including enplanements, landing weights, landing fee (per 1000 lbs), annual aircraft operations, passenger cost, and marketing expenses. Some indicators may have positive correlation with the overall attractiveness, yet others, such as high passenger cost per enplanements or marketing expenses, may indicate a lack of present attractiveness. At the same time, there could be co-linearity between two or more indicators – for instance, higher enplanements may align with higher landing fees, showing that some airports, given their popularity, want to profit from the large number of flights it welcomes.
There are many interesting observations that can be made from this compiled charts. Two that pops out the most are 1) Guam and Anchorage’s marketing expenses (and lack of marketing expenses at HNL and SJU) and 2) Guam’s shocking annual landing weights.
In comparison to previous years’ statistics, Guam airport has doubled its marketing/promotion expenses in 2022 ($189,101 in comparison to $72,845 in 2021) while Anchorage increase by more than 4 times from $17,098 in 2021 to $76,123. Although how this expense was specifically deployed is unknown, this potentially shows the desperation of two airports to promote themselves, especially in comparison to well-established destinations such as Hawaii or Puerto Rico, who are not deploying any dollar on such item.
Guam’s shocking annual landing weights, on the other hand, is harder to find an explicit justification except for the island’s geopolitical importance and its role as a critical military base.
# Draw Relevant Data
Popularity <- Finance_Data %>% filter(YEAR== 2022) %>%
select(LOC_ID, YEAR, Enplanements, Landed.weights.in.pounds, Signatory.landing.fee.rate.per.1000.lbs,
Annual.aircraft.operations, Passenger.airline.cost.per.enplanement, Marketing...Advertising...Promotions)
fig_a <- Popularity %>% plot_ly(x = ~LOC_ID, y = ~Enplanements, type = 'bar', color = ~LOC_ID, colors=colors_4) %>%
layout(xaxis = list(title = 'Enplanements'))
fig_b <- Popularity %>% plot_ly(x = ~LOC_ID, y = ~Landed.weights.in.pounds, type = 'bar', color = ~LOC_ID, colors=colors_4) %>%
layout(xaxis = list(title = 'Landing Weights (Pounds)'))
fig_c <- Popularity %>% plot_ly(x = ~LOC_ID, y = ~Signatory.landing.fee.rate.per.1000.lbs, color = ~LOC_ID, colors=colors_4) %>%
layout(xaxis = list(title = 'Landing Fee per 1000 lbs'))
fig_d <- Popularity %>% plot_ly(x = ~LOC_ID, y = ~Annual.aircraft.operations, color = ~LOC_ID, colors=colors_4) %>%
layout(xaxis = list(title = 'Annual Aircraft Operations'))
fig_e <- Popularity %>% plot_ly(x = ~LOC_ID, y = ~Passenger.airline.cost.per.enplanement, color = ~LOC_ID, colors=colors_4) %>%
layout(xaxis = list(title = 'Passenger Cost per Enplanements'))
fig_f <- Popularity %>% plot_ly(x = ~LOC_ID, y = ~Marketing...Advertising...Promotions, color = ~LOC_ID, colors=colors_4) %>%
layout(xaxis = list(title = 'Marketing Expenses'))
fig <- subplot(fig_a, fig_b, fig_c, fig_d, fig_e, fig_f, nrows=3, margin=0.08, titleX=TRUE) %>%
layout(showlegend=FALSE, title = list(text = "6 Indicators - Visitors, Landing Weights, and Marketing"),
margin=0.1, height=1000)
fig